AAAI.2018 - Multiagent Systems

Total: 18

#1 Dilated FCN for Multi-Agent 2D/3D Medical Image Registration [PDF] [Copy] [Kimi]

Authors: Shun Miao ; Sebastien Piat ; Peter Fischer ; Ahmet Tuysuzoglu ; Philip Mewes ; Tommaso Mansi ; Rui Liao

2D/3D image registration to align a 3D volume and 2D X-ray images is a challenging problem due to its ill-posed nature and various artifacts presented in 2D X-ray images. In this paper, we propose a multi-agent system with an auto attention mechanism for robust and efficient 2D/3D image registration. Specifically, an individual agent is trained with dilated Fully Convolutional Network (FCN) to perform registration in a Markov Decision Process (MDP) by observing a local region, and the final action is then taken based on the proposals from multiple agents and weighted by their corresponding confidence levels. The contributions of this paper are threefold. First, we formulate 2D/3D registration as a MDP with observations, actions, and rewards properly defined with respect to X-ray imaging systems. Second, to handle various artifacts in 2D X-ray images, multiple local agents are employed efficiently via FCN-based structures, and an auto attention mechanism is proposed to favor the proposals from regions with more reliable visual cues. Third, a dilated FCN-based training mechanism is proposed to significantly reduce the Degree of Freedom in the simulation of registration environment, and drastically improve training efficiency by an order of magnitude compared to standard CNN-based training method. We demonstrate that the proposed method achieves high robustness on both spine cone beam Computed Tomography data with a low signal-to-noise ratio and data from minimally invasive spine surgery where severe image artifacts and occlusions are presented due to metal screws and guide wires, outperforming other state-of-the-art methods (single agent-based and optimization-based) by a large margin.

#2 POMDP-Based Decision Making for Fast Event Handling in VANETs [PDF] [Copy] [Kimi]

Authors: Shuo Chen ; Athirai Irissappane ; Jie Zhang

Malicious vehicle agents broadcast fake information about traffic events and thereby undermine the benefits of vehicle-to-vehicle communication in vehicular ad-hoc networks (VANETs). Trust management schemes addressing this issue do not focus on effective/fast decision making in reacting to traffic events. We propose a Partially Observable Markov Decision Process (POMDP) based approach to balance the trade-off between information gathering and exploiting actions resulting in faster responses. Our model copes with malicious behavior by maintaining it as part of a small state space, thus is scalable for large VANETs. We also propose an algorithm to learn model parameters in a dynamic behavior setting. Experimental results demonstrate that our model can effectively balance the decision quality and response time while still being robust to sophisticated malicious attacks.

#3 Manipulative Elicitation — A New Attack on Elections with Incomplete Preferences [PDF] [Copy] [Kimi]

Author: Palash Dey

Lu and Boutilier proposed a novel approach based on "minimax regret" to use classical score based voting rules in the setting where preferences can be any partial (instead of complete) orders over the set of alternatives. We show here that such an approach is vulnerable to a new kind of manipulation which was not present in the classical (where preferences are complete orders) world of voting. We call this attack "manipulative elicitation." More specifically, it may be possible to (partially) elicit the preferences of the agents in a way that makes some distinguished alternative win the election who may not be a winner if we elicit every preference completely. More alarmingly, we show that the related computational task is polynomial time solvable for a large class of voting rules which includes all scoring rules, maximin, Copeland α for every α ∈ [0,1], simplified Bucklin voting rules, etc. We then show that introducing a parameter per pair of alternatives which specifies the minimum number of partial preferences where this pair of alternatives must be comparable makes the related computational task of manipulative elicitation NP-complete for all common voting rules including a class of scoring rules which includes the plurality, k-approval, k-veto, veto, and Borda voting rules, maximin, Copeland α for every α ∈ [0,1], and simplified Bucklin voting rules. Hence, in this work, we discover a fundamental vulnerability in using minimax regret based approach in partial preferential setting and propose a novel way to tackle it.

#4 Strategic Coalitions With Perfect Recall [PDF] [Copy] [Kimi]

Authors: Pavel Naumov ; Jia Tao

The paper proposes a bimodal logic that describes an interplay between distributed knowledge modality and coalition know-how modality. Unlike other similar systems, the one proposed here assumes perfect recall by all agents. Perfect recall is captured in the system by a single axiom. The main technical results are the soundness and the completeness theorems for the proposed logical system.

#5 An Ant-Based Algorithm to Solve Distributed Constraint Optimization Problems [PDF] [Copy] [Kimi]

Authors: Ziyu Chen ; Tengfei Wu ; Yanchen Deng ; Cheng Zhang

As an important population-based algorithm, ant colony optimization (ACO) has been successfully applied into various combinatorial optimization problems. However, much existing work in ACO focuses on solving centralized problems. In this paper, we present a novel algorithm that takes the power of ants to solve Distributed Constraint Optimization Problems (DCOPs), called ACO_DCOP. In ACO_DCOP, a new mechanism that captures local benefits is proposed to compute heuristic factors and a new method that considers the cost structure of DCOPs is proposed to compute pheromone deltas appropriately. Moreover, pipelining technique is introduced to make full use of the computational capacity and improve the efficiency. In our theoretical analysis, we prove that ACO_DCOP is an anytime algorithm. Our empirical evaluation indicates that ACO_DCOP is able to find solutions of equal or significantly higher quality than state-of-the-art DCOP algorithms.

#6 HogRider: Champion Agent of Microsoft Malmo Collaborative AI Challenge [PDF] [Copy] [Kimi]

Authors: Yanhai Xiong ; Haipeng Chen ; Mengchen Zhao ; Bo An

It has been an open challenge for self-interested agents to make optimal sequential decisions in complex multiagent systems, where agents might achieve higher utility via collaboration. The Microsoft Malmo Collaborative AI Challenge (MCAC), which is designed to encourage research relating to various problems in Collaborative AI, takes the form of a Minecraft mini-game where players might work together to catch a pig or deviate from cooperation, for pursuing high scores to win the challenge. Various characteristics, such as complex interactions among agents, uncertainties, sequential decision making and limited learning trials all make it extremely challenging to find effective strategies. We present HogRider---the champion agent of MCAC in 2017 out of 81 teams from 26 countries. One key innovation of HogRider is a generalized agent type hypothesis framework to identify the behavior model of the other agents, which is demonstrated to be robust to observation uncertainty. On top of that, a second key innovation is a novel Q-learning approach to learn effective policies against each type of the collaborating agents. Various ideas are proposed to adapt traditional Q-learning to handle complexities in the challenge, including state-action abstraction to reduce problem scale, a warm start approach using human reasoning for addressing limited learning trials, and an active greedy strategy to balance exploitation-exploration. Challenge results show that HogRider outperforms all the other teams by a significant edge, in terms of both optimality and stability.

#7 Social Norms of Cooperation With Costly Reputation Building [PDF] [Copy] [Kimi]

Authors: Fernando Santos ; Jorge Pacheco ; Francisco Santos

Social norms regulate actions in artificial societies, steering collective behavior towards desirable states. In real societies, social norms can solve cooperation dilemmas, constituting a key ingredient in systems of indirect reciprocity: reputations of agents are assigned following social norms that identify their actions as good or bad. This, in turn, implies that agents can discriminate between the different actions of others and that the behaviors of each agent are known to the population at large. This is only possible if the agents report their interactions. Reporting constitutes, this way, a fundamental ingredient of indirect reciprocity, as in its absence cooperation in a multiagent system may collapse. Yet, in most studies to date, reporting is assumed to be cost-free, which collides with many life situations, where reporting can easily incur a cost (costly reputation building). Here we develop a new model of indirect reciprocity that allows reputation building to be costly. We show that only two norms can sustain cooperation under costly reputation building, a feature that requires agents to be able to anticipate the reporting intentions of their opponents, depending sensitively on both the cost of reporting and the accuracy level of reporting anticipation.

#8 Control Argumentation Frameworks [PDF] [Copy] [Kimi]

Authors: Yannis Dimopoulos ; Jean-Guy Mailly ; Pavlos Moraitis

Dynamics of argumentation is the family of techniques concerned with the evolution of an argumentation framework (AF), for instance to guarantee that a given set of arguments is accepted. This work proposes Control Argumentation Frameworks (CAFs), a new approach that generalizes existing techniques, namely normal extension enforcement, by accommodating the possibility of uncertainty in dynamic scenarios. A CAF is able to deal with situations where the exact set of arguments is unknown and subject to evolution, and the existence (or direction) of some attacks is also unknown. It can be used by an agent to ensure that a set of arguments is part of one (or every) extension whatever the actual set of arguments and attacks. A QBF encoding of reasoning with CAFs provides a computational mechanism for determining whether and how this goal can be reached. We also provide some results concerning soundness and completeness of the proposed encoding as well as complexity issues.

#9 Privacy-Preserving Policy Iteration for Decentralized POMDPs [PDF] [Copy] [Kimi]

Authors: Feng Wu ; Shlomo Zilberstein ; Xiaoping Chen

We propose the first privacy-preserving approach to address the privacy issues that arise in multi-agent planning problems modeled as a Dec-POMDP. Our solution is a distributed message-passing algorithm based on trials, where the agents' policies are optimized using the cross-entropy method. In our algorithm, the agents' private information is protected using a public-key homomorphic cryptosystem. We prove the correctness of our algorithm and analyze its complexity in terms of message passing and encryption/decryption operations. Furthermore, we analyze several privacy aspects of our algorithm and show that it can preserve the agent privacy of non-neighbors, model privacy, and decision privacy. Our experimental results on several common Dec-POMDP benchmark problems confirm the effectiveness of our approach.

#10 Maximizing Influence in an Unknown Social Network [PDF] [Copy] [Kimi]

Authors: Bryan Wilder ; Nicole Immorlica ; Eric Rice ; Milind Tambe

In many real world applications of influence maximization, practitioners intervene in a population whose social structure is initially unknown. This poses a multiagent systems challenge to act under uncertainty about how the agents are connected. We formalize this problem by introducing exploratory influence maximization, in which an algorithm queries individual network nodes (agents) to learn their links. The goal is to locate a seed set nearly as influential as the global optimum using very few queries. We show that this problem is intractable for general graphs. However, real world networks typically have community structure, where nodes are arranged in densely connected subgroups. We present the ARISEN algorithm, which leverages community structure to find an influential seed set. Experiments on real world networks of homeless youth, village populations in India, and others demonstrate ARISEN's strong empirical performance. To formally demonstrate how ARISEN exploits community structure, we prove an approximation guarantee for ARISEN on graphs drawn from the Stochastic Block Model.

#11 Decentralised Learning in Systems With Many, Many Strategic Agents [PDF] [Copy] [Kimi]

Authors: David Mguni ; Joel Jennings ; Enrique Munoz de Cote

Although multi-agent reinforcement learning can tackle systems of strategically interacting entities, it currently fails in scalability and lacks rigorous convergence guarantees. Crucially, learning in multi-agent systems can become intractable due to the explosion in the size of the state-action space as the number of agents increases. In this paper, we propose a method for computing closed-loop optimal policies in multi-agent systems that scales independently of the number of agents. This allows us to show, for the first time, successful convergence to optimal behaviour in systems with an unbounded number of interacting adaptive learners. Studying the asymptotic regime of N-player stochastic games, we devise a learning protocol that is guaranteed to converge to equilibrium policies even when the number of agents is extremely large. Our method is model-free and completely decentralised so that each agent need only observe its local state information and its realised rewards. We validate these theoretical results by showing convergence to Nash-equilibrium policies in applications from economics and control theory with thousands of strategically interacting agents.

#12 Multiagent Connected Path Planning: PSPACE-Completeness and How to Deal With It [PDF] [Copy] [Kimi]

Authors: Davide Tateo ; Jacopo Banfi ; Alessandro Riva ; Francesco Amigoni ; Andrea Bonarini

In the Multiagent Connected Path Planning problem (MCPP), a team of agents moving in a graph-represented environment must plan a set of start-goal joint paths which ensures global connectivity at each time step, under some communication model. The decision version of this problem asking for the existence of a plan that can be executed in at most a given number of steps is claimed to be NP-complete in the literature. The NP membership proof, however, is not detailed. In this paper, we show that, in fact, even deciding whether a feasible plan exists is a PSPACE-complete problem. Furthermore, we present three algorithms adopting different search paradigms, and we empirically show that they may efficiently obtain a feasible plan, if any exists, in different settings.

#13 Learning the Behavior of a Dynamical System Via a “20 Questions” Approach [PDF] [Copy] [Kimi]

Authors: Abhijin Adiga ; Chris Kuhlman ; Madhav Marathe ; Ravi S. S. ; Daniel Rosenkrantz ; Richard Stearns

Using a graphical discrete dynamical system to model a networked social system, the problem of inferring the behavior of the system can be formulated as the problem of learning the local functions of the dynamical system. We investigate the problem assuming an active form of interaction with the system through queries. We consider two classes of local functions (namely, symmetric and threshold functions) and two interaction modes, namely batch mode (where all the queries must be submitted together) and adaptive mode (where the set of queries submitted at a stage may rely on the answers received to previous queries). We develop complexity results and efficient heuristics that produce query sets under both query modes. We demonstrate the performance of our heuristics through experiments on over 20 well-known networks.

#14 Integrated Cooperation and Competition in Multi-Agent Decision-Making [PDF] [Copy] [Kimi]

Authors: Kyle Wray ; Akshat Kumar ; Shlomo Zilberstein

Observing that many real-world sequential decision problems are not purely cooperative or purely competitive, we propose a new model—cooperative-competitive process (CCP)—that can simultaneously encapsulate both cooperation and competition. First, we discuss how the CCP model bridges the gap between cooperative and competitive models. Next, we investigate a specific class of group-dominant CCPs, in which agents cooperate to achieve a common goal as their primary objective, while also pursuing individual goals as a secondary objective. We provide an approximate solution for this class of problems that leverages stochastic finite-state controllers. The model is grounded in two multi-robot meeting and box-pushing domains that are implemented in simulation and demonstrated on two real robots.

#15 Knowledge, Fairness, and Social Constraints [PDF] [Copy] [Kimi]

Authors: Haris Aziz ; Sylvain Bouveret ; Ioannis Caragiannis ; Ira Giagkousi ; Jérôme Lang

In the context of fair allocation of indivisible items, fairness concepts often compare the satisfaction of an agent to the satisfaction she would have from items that are not allocated to her: in particular, envy-freeness requires that no agent prefers the share of someone else to her own share. We argue that these notions could also be defined relative to the knowledge that an agent has on how the items that she does not receive are distributed among other agents. We define a family of epistemic notions of envy-freeness, parameterized by a social graph, where an agent observes the share of her neighbours but not of her non-neighbours. We also define an intermediate notion between envy-freeness and proportionality, also parameterized by a social graph. These weaker notions of envy-freeness are useful when seeking a fair allocation, since envy-freeness is often too strong. We position these notions with respect to known ones, thus revealing new rich hierarchies of fairness concepts. Finally, we present a very general framework that covers all the existing and many new fairness concepts.

#16 Dynamic Pricing for Reusable Resources in Competitive Market With Stochastic Demand [PDF] [Copy] [Kimi]

Authors: Jiang Rong ; Tao Qin ; Bo An

The market for selling reusable products (e.g., car rental, cloud services and network access resources) is growing rapidly over the last few years, where service providers maximize their revenues through setting optimal prices. While there has been lots of research on pricing optimization, existing works often ignore dynamic property of demand and the competition among providers. Thus, existing pricing solutions might be far from optimal in realistic markets. This paper provides the first study of service providers' dynamic pricing in consideration of market competition and makes three key contributions along this line. First, we propose a comprehensive model that takes into account the dynamic demand and interaction among providers, and formulate the optimal pricing policy in the competitive market as an equilibrium. Second, we propose an approximate Nash equilibrium to describe providers' behaviors, and design an efficient algorithm to compute the equilibrium which is guaranteed to converge. Third, we derive many properties of the model without any further constraints on demand functions, which can reduce the search space of policies in the algorithm. Finally, we conduct extensive experiments with different parameter settings, showing that the approximate equilibrium is very close to the Nash equilibrium and our proposed pricing policy outperforms existing strategies.

#17 Preallocation and Planning Under Stochastic Resource Constraints [PDF] [Copy] [Kimi]

Authors: Frits de Nijs ; Matthijs Spaan ; Mathijs de Weerdt

Resource constraints frequently complicate multi-agent planning problems. Existing algorithms for resource-constrained, multi-agent planning problems rely on the assumption that the constraints are deterministic. However, frequently resource constraints are themselves subject to uncertainty from external influences. Uncertainty about constraints is especially challenging when agents must execute in an environment where communication is unreliable, making on-line coordination difficult. In those cases, it is a significant challenge to find coordinated allocations at plan time depending on availability at run time. To address these limitations, we propose to extend algorithms for constrained multi-agent planning problems to handle stochastic resource constraints. We show how to factorize resource limit uncertainty and use this to develop novel algorithms to plan policies for stochastic constraints. We evaluate the algorithms on a search-and-rescue problem and on a power-constrained planning domain where the resource constraints are decided by nature. We show that plans taking into account all potential realizations of the constraint obtain significantly better utility than planning for the expectation, while causing fewer constraint violations.

#18 The Role of Data-Driven Priors in Multi-Agent Crowd Trajectory Estimation [PDF] [Copy] [Kimi]

Authors: Gang Qiao ; Sejong Yoon ; Mubbasir Kapadia ; Vladimir Pavlovic

Resource constraints frequently complicate multi-agent planning problems. Existing algorithms for resource-constrained, multi-agent planning problems rely on the assumption that the constraints are deterministic. However, frequently resource constraints are themselves subject to uncertainty from external influences. Uncertainty about constraints is especially challenging when agents must execute in an environment where communication is unreliable, making on-line coordination difficult. In those cases, it is a significant challenge to find coordinated allocations at plan time depending on availability at run time. To address these limitations, we propose to extend algorithms for constrained multi-agent planning problems to handle stochastic resource constraints. We show how to factorize resource limit uncertainty and use this to develop novel algorithms to plan policies for stochastic constraints. We evaluate the algorithms on a search-and-rescue problem and on a power-constrained planning domain where the resource constraints are decided by nature. We show that plans taking into account all potential realizations of the constraint obtain significantly better utility than planning for the expectation, while causing fewer constraint violations.